Proposal Title

Multi-channel Quiet Calls, a method and system for interleaving communication amongst multiple voice
paths.

Brief Description

This invention builds on the concepts of mixed-mode synchronous communications first introduced in
Quiet Calls. Quiet Calls allow mobile telephone users to respond to a telephone conversation without
talking aloud. A person selects what to say from a non-vocal phone interface and the corresponding audio
is inserted into the communication voice path. We extend Quiet Calls here by applying the technique over
multiple voice paths, instead of a single call. Unlike automated call processing systems that select a
voice response prior to an incoming call (even when adapted to caller ID), with Multi-channel Quiet Calls
the selection of voice phrases occurs synchronously and with one or more possible recipients. The ability
to deal concurrently with multiple voice paths is useful in a number of situations, including: responses
for callers on a call waiting line; responses for callers on hold; and selective dispatch over multiple
radio channels, as used by taxicabs, security guards, and other distributed command and control
situations. Incoming audio channels may be mixed or kept separate. Outgoing messages may be received in a
point-to-point, multicast, or broadcast manner. Interfaces are described for phrase and recipient
selection.
Description of Invention

Introduction

Quiet Calls allow mobile telephone users to respond to a telephone conversation without talking aloud. A
person using Quiet Calls selects what to say from a non-vocal phone interface (e.g., tap, press, click).
Pre-recorded or synthesized voice for the selection is then silently introduced into the telephone voice
path. The method extends to all forms of voice communication, including landline telephone, cell phone,
Internet phone, videophone, two-way radio, CB radio, intercom, etc.

The Quiet Calls method was described for communicating across one voice path by using a non-vocal
interface.

Motivation and Uses

In Quiet Calls we investigated the need to simultaneously address the voice needs of both a phone call
and a local situation (e.g., meeting room, waiting area). Features like Quiet Calls and Call Waiting come
about due to the need to interact with different people about different things at the same time.

Common situations where the need for multiple channels of synchronous audio communication arises
include:

• Call Waiting
    When a call comes in, the recipient is given the choice of putting the ongoing conversation on hold
    and switching to the other line. Typical responses in this situation are "I'm on the other line, can
    you wait for a minute?" or "I'm on the other line, I'll call you back". For example, the following
    figure shows one of the many possible conversations that may be conducted in this way.




[Figure: Timeline of Multi-Channel Conversation over Call Waiting. Events shown: new call goes to Call
Waiting; first call ends. Utterances shown, in order: "[...] need to be going now. [...] we done?";
"Yes. I'll see you tomorrow. Bye"; MCQC: "Hi. I'm on the other line. I'll be right there."; "Okay...";
"Hi, I'm here..."]
• Call Hold
    When numerous communications (e.g., calls, face-to-face conversations) are arriving at one place
    (e.g., a reception desk), all but one channel is quickly placed on hold (e.g., "Xerox Lobby. Can
    you hold?"). A receptionist on the phone may visually signal arriving persons that they will be
    attended to in a moment.

• Voice Dispatching
    In voice dispatching (e.g., taxicab) assignment, a large amount of information is quickly broadcast
    to a number of recipients. Typically, all communications are preceded by an identifying call
    handle (e.g., "Car 54") and a brief message that requires acknowledgement (e.g., "Car 54. Roger
    that").

• Human Oversight of Automated Media Systems
    Users who wander alone through an online system such as a voice response system (e.g., banking or
    reservation systems) can sometimes become lost or frustrated. A system might allow users to
    stay on the line for the next operator. Alternatively, a Multi-Channel Quiet Calls approach would
    be to allow a human operator to oversee many channels at once (e.g., by listening in or by following
    a graphical representation of the user's navigation). When one channel appears to be going astray
    (e.g., suspicious restarts of the voice response system, verbal cries for help), the operator could
    direct verbal assistance into the line through Multi-Channel Quiet Calls or engage the line
    verbally if necessary. A similar arrangement could be made for online tutorial systems.

As we investigated Quiet Calls, the possibility of applying this non-vocal conversation technique to
handle other simultaneous communication demands presented itself. Just as the receptionist may hold up
an index finger to indicate that attention is to be expected 'in one moment', a Quiet Calls response
could be delivered on one line while the user converses on another.

Multi-channel Quiet Call Capabilities

The Multi-channel Quiet Call approach for a public conversational technology as detailed in this proposal
has the following features, applicable to both placing and receiving calls:

• Conversational phrases are directed to one or more communicating parties.
    Non-audible input operations (pressing a key or button, touching a display) are translated into
    appropriate audio conversation signals on the one or more selected channels. The user specifies
    the phrase and the channel for that phrase.

• One conversation may be conducted audibly.
    Only the participants in multiple-contact situations need change their communication mode. Other
    callers participate as in any phone call.

• The user may listen to one channel at a time or may mix the audio from several channels.
    In many person-to-person phone calls, a caller must pay strict attention to one channel of
    information. There are other circumstances where it may be possible to mix the incoming audio
    from multiple channels of information. For example, if the caller is on hold waiting for a service
    call, it may be possible to converse with another party. When a service representative suddenly
    takes the call, the conversational needs quickly change. A Quiet Calls approach addresses
    the simultaneous conversation needs by directing a phrase at the service representative that s/he
    will get immediate attention and directing a phrase at the third party that the conversation needs
    to be concluded.



• The conversation permitted is expressive.
    Users of Quiet Calls identified a number of useful phrases relating to future contact, redirecting
    calls, and simple responses. Conversation representations may be predefined, recorded as needed,

    Making future contact:
        I'll get back to you later.
        We'll talk when I get back to the office.
        I'll call you later
        Give me 5 minutes, I'll call you back
        Is there anything else you need to say
        Some acknowledgement like "I heard what you said"
        I'll call you back as soon as I'm free
        I'll call you back in 5 minutes or 10 minutes
        Please hold, I'll be with you in a minute

    Directing call elsewhere:
        Send me an email
        Leave a message and I'll get back to you.
        Please leave a message
        Leave a voice message, at home or at work.
        Fax it to me
        Can you call me later
        Direct caller elsewhere
        Be more explicit about going someplace where he can talk
        I'm walking out now

    Responding simply:
        Yes
        No
        Maybe
        I agree
        I disagree
        Say that again
        Now's not a good time to talk
        I'm tied up right now.
        Okay, bye

• The communication interface is easy to use when a user is engaged in other activities.
    The interface represents available conversation utterances so that they may be easy to recognize
    (e.g., icon, text label) and invoke (e.g., point and click). One input selection (e.g., button press)
    may invoke a possibly complex sequence of responses supporting the dialogue (e.g., putting a
    person politely on hold or terminating the conversation).
• Audio hardware may be activated at any point in the voice call path.
    Phrases may be stored or played back in the handset, as an accessory, in the call channel, or in a
    shared processor available to multiple call channels.

• Audio hardware on one channel may be activated even if the caller is actively engaged on
  another.
    Phrase selections may be communicated to audio hardware on one channel either by messaging
    (e.g., over a computer network) or by temporary connection to a channel (e.g., hook-switch,
    selection delivery such as touchtone commands, and return to the original channel).
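The capability noted above, in which one input selection invokes a sequence of responses, can be
sketched as a table that binds each button to an ordered list of steps. This is a minimal illustration
only: the button names, the step verbs ("play", "hold", "hang_up"), and the use of a plain list as a
channel log are all assumptions made for the sketch and are not part of the proposal.

```python
# Hypothetical macro table: each button maps to an ordered list of
# (verb, argument) steps. The verbs are illustrative assumptions.
MACROS = {
    "polite_hold": [
        ("play", "Please hold, I'll be with you in a minute"),
        ("hold", None),
    ],
    "end_call": [
        ("play", "Okay, bye"),
        ("hang_up", None),
    ],
}


def invoke(button, channel_log):
    """Append every step bound to one button press to a channel's log."""
    for verb, argument in MACROS[button]:
        channel_log.append((verb, argument))
    return channel_log


log = invoke("polite_hold", [])
```

A single press of the hypothetical "polite_hold" button therefore both voices the phrase and places the
caller on hold, without further input from the user.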
Technical Details

A multi-channel quiet call conversation as described here is an electronically assisted discussion (e.g.,
a phone call) being held between two or more parties that has the following attributes:

• The conversation is being expressed at least in part vocally (e.g., via telephone, cell phone,
  Internet phone, videophone, two-way radio, intercom, etc.).

• One or more parties in the conversation are located in a situation where multiple people must be
  engaged (e.g., call hold, call waiting, message dispatch, etc.).

• Consequently, one or more parties in the discussion use an alternative, non-vocal mode of
  discussion (e.g., keyboard, buttons, touchscreen, etc.) for the audible content of the discussion,
  which is transformed into an equivalent audible electronic representation on selected audio
  channels.

An architecture for multi-channel quiet-mode conversation consists of the components shown in the
following figures. Two modes of operation are defined: conducting and preparing a call.

Conducting a Multi-Channel Quiet Call.

In this mode, a user conducts a voice conversation while at the same time engaging audibly on another
channel with no audible feedback into other channels. The capabilities required for supporting this mode
of communication are shown in the following architecture:



[Figure: System Mode: Conducting a Conversation. Components shown: Channel-User Connector, Phrase
Selection, Channel Selection, Audio Store & Playback, Audio-Channel Connector, Audio Input.]



A user views a Conversation Representation and makes phrase selections about utterances to be voiced
over the communication channel. When selected, phrases are retrieved and audible output signals are
produced for the communication line. An Audio to Channel Connector provides this electrical connection.
A Channel to User Connector allows the user to hear both the conversation generated by the system and
other callers. A switchable Audio Input (e.g., microphone) allows the user to voice directly into the
channel when appropriate.

A user is given a choice of which channels receive the selected phrases. The following figures describe
the channel selection for the Call Waiting, Call Hold, and Dispatch situations highlighted above. The
channel selection is accomplished by accepting the user's channel choices through menu selection,
handset switches or buttons, and other selection means known in the art.
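The flow just described (select a phrase, select the receiving channels, retrieve the audio, and place
it on those channels only) can be sketched as follows. This is a minimal illustration under stated
assumptions: the class and function names are hypothetical, plain text stands in for stored audio, and
appending to a list stands in for the Audio to Channel Connector.

```python
from dataclasses import dataclass, field


@dataclass
class Channel:
    """One voice path (hypothetical model); `outbox` records audio sent to it."""
    name: str
    outbox: list = field(default_factory=list)


class AudioStore:
    """Maps a phrase label to stored audio; plain text stands in for audio."""

    def __init__(self, phrases):
        self._phrases = dict(phrases)

    def retrieve(self, label):
        return self._phrases[label]


def send_phrase(store, label, selected_channels):
    """Retrieve the selected phrase and play it on each selected channel only."""
    audio = store.retrieve(label)
    for channel in selected_channels:
        # Stand-in for the Audio to Channel Connector.
        channel.outbox.append(audio)
    return audio


store = AudioStore({"hold": "I'm on the other line, can you wait for a minute?"})
active = Channel("line 1")
waiting = Channel("line 2 (call waiting)")
send_phrase(store, "hold", [waiting])
```

In this sketch the caller on the waiting line receives the phrase while the active line receives
nothing, so the audible conversation continues undisturbed.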




[Figure: Method for Multi-channel phrase and channel selection for Call Waiting and Call Hold
situations]

[Figure: Method for Multi-channel phrase and channel selection for Dispatch situations. Labels shown:
Dispatch; User conducting multiple communications over a communication medium.]



Preparing Quiet Call Conversation Structures.

In this mode the user prepares for a non-vocal conversation by adding, deleting, or modifying
conversation structures (representations and data storage) and channel selections held within the
system. Preparing for phrase selection is very much as described in the Quiet Calls system. Updating the
channel selection is accomplished by presenting the user with channel selection choices, including
menus, handset switches or buttons, and other user interface means known in the art.
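The preparation mode described above can be sketched as a small store that supports adding, modifying,
and deleting phrases and updating the list of selectable channels. The class name, method names, and
the use of text in place of recorded audio are assumptions made for illustration; the Quiet Calls system
itself may organize these structures differently.

```python
class ConversationStructures:
    """Hypothetical holder for phrases and channel choices prepared before a call."""

    def __init__(self):
        self.phrases = {}   # label -> phrase text (stand-in for recorded audio)
        self.channels = []  # channel names offered in the selection interface

    def add_phrase(self, label, text):
        self.phrases[label] = text

    def modify_phrase(self, label, text):
        # Modifying a phrase that was never added is treated as an error.
        if label not in self.phrases:
            raise KeyError(label)
        self.phrases[label] = text

    def delete_phrase(self, label):
        del self.phrases[label]

    def set_channels(self, names):
        self.channels = list(names)


prep = ConversationStructures()
prep.add_phrase("later", "I'll call you later")
prep.add_phrase("fax", "Fax it to me")
prep.modify_phrase("later", "I'll call you back as soon as I'm free")
prep.delete_phrase("fax")
prep.set_channels(["line 1", "line 2"])
```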



Multi-channel Quiet Calls Embodiments

In a multi-channel quiet mode conversation, all sides of the conversation use a common communication
mechanism (e.g., telephone infrastructure). But the person with the need to converse with several others
simultaneously would have a special interface for responding to these other conversations.

Three basic arrangements of this principle are described here:

(1) Separate channel selection. Each party-to-party communication may occur on a different
    communication channel (e.g., different telephone voice paths, different radio frequencies). In this
    case, dealing with these calls at the same time while only being able to access one at a time
    requires that the user switch between channels. The user responds to each individually and moves
    between them (e.g., the call waiting and call hold situations described above). The user selects
    which channel to attend to and then selects which phrase to apply on that channel. The audio
    generating hardware must be accessible to that channel (e.g., in the handset, as an accessory
    attached to the handset, or linked into the communication hardware for the voice path).

    One variation on this notion allows the user to stay in direct audible contact with one channel and
    trigger phrases in the other. This may be accomplished by either

    (a) momentarily switching channels, issuing the phrase selection command to that channel's
        hardware, and switching back to the first channel; or

    (b) dispatching phrase selection commands to another channel by some other connection
        (e.g., over a computer network to the corresponding control processors).

(2) Mixed channel selection. In this arrangement, the audio input to the Multi-channel Quiet Calls
    user is the product of mixing several audio streams from different channels. The audio output is
    kept separate for each active channel and the user may reply on other channels in the same way as
    with separate channel selection, above. Where the communication infrastructure does not directly
    support such audio mixing, a bridging approach may be implemented, as shown in the following
    figure. A telephony processor connects (e.g., by conference call) to all open channels. Channel
    selection is made through commands sent to the telephony processor (e.g., switch channels, mix
    audio channels, answer, hang up) via a separate data network. The telephony processor mixes the
    selected channels and streams the audio only to the Multi-channel Quiet Calls user (e.g., by direct
    electrical connection to the user's earpiece).

    [Figure: Bridge configuration for mixing audio channels to the user]

(3) Broadcast selection. Some communication means use a single broadcast medium (e.g., radio
    broadcast for dispatchers). Typically, each communication is identified by source of message,
    destination, and message content. Channel selection in this case amounts to having the
    Multi-Channel Quiet Calls system generate the audio preamble (source and destination identifiers)
    based on user selection (e.g., as defaults or explicit selection).
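The three arrangements above can be sketched with one small function each. This is an illustration
under stated assumptions, not the prototype's implementation: the function names are hypothetical, audio
is modeled as lists of 16-bit integer samples, mixing is done by summing and clipping (one common
approach; the proposal does not specify a mixing method), and the spoken preamble format
"<source> to <destination>." is invented for the example.

```python
def trigger_on_other_channel(current, target, phrase, outbox):
    """Arrangement (1a): switch to `target`, issue the phrase, switch back.

    Returns the sequence of channels visited; `outbox` maps a channel
    name to the phrases delivered on it.
    """
    path = [current]
    path.append(target)                           # momentary switch
    outbox.setdefault(target, []).append(phrase)  # phrase selection command
    path.append(current)                          # return to first channel
    return path


def mix_channels(streams):
    """Arrangement (2): sum equal-length sample lists, clipping to 16-bit range."""
    mixed = []
    for samples in zip(*streams):
        total = sum(samples)
        mixed.append(max(-32768, min(32767, total)))
    return mixed


def broadcast_message(source, destination, content):
    """Arrangement (3): prefix the content with a generated preamble."""
    return f"{source} to {destination}. {content}"


outbox = {}
path = trigger_on_other_channel(
    "line 1", "line 2", "Please hold, I'll be with you in a minute", outbox
)
mixed = mix_channels([[1000, -2000, 30000], [500, -500, 10000]])
message = broadcast_message("Dispatch", "Car 54", "Roger that")
```

In the mixing example the third sample pair sums to 40000, which exceeds the 16-bit maximum and is
therefore clipped to 32767.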
Novelty

No other system is known that has been designed for conducting an expressive conversation non-vocally in
a multi-channel environment. A number of systems contain elements similar to those described in parts of
this proposal. These are described below.

Novel features of Multi-channel Quiet Calls that enable parts of conversations to be held across multiple
communication channels include the following:

• Conversational phrases are directed to one or more communicating parties.

• One conversation may be conducted audibly.

• The user may listen to one channel at a time or may mix the audio from several channels.

• The conversation permitted is expressive.

• The communication interface is easy to use when a user is engaged in other activities.

• Audio hardware may be activated at any point in the voice call path.

• Audio hardware on one channel may be activated even if the caller is actively engaged on
  another.
Summary of Related Work

This invention builds on the previous Quiet Calls invention, filed 13 September 2000. Multi-channel Quiet
Calls adds the ability to select from among different channels, redirect voice phrases to those channels,
and choose to mix or keep audio separate between the channels.

The closest system we have found to Multi-channel Quiet Calls is the system currently in use for
directory assistance. An operator gets the city and name from a caller seeking a number. The operator
looks up the response and then queues a text-to-speech answer and departs the conversation, presumably
moving on to another call. This system does allow communication on two channels, but not in a
bi-directional, synchronous way. When the operator queues the message, the call terminates for the
operator.

Other audio generation techniques include the following.

Phone-Audio Interaction

The following two systems provide for interactive phone and recorded audio use. These do not give the
cell phone user a quiet conversational capability, but they do support arrangements of hardware similar
to those required for Quiet Calls.

U.S. Patent Number 5790957, "Speech recall in cellular telephone." An apparatus permits storage of a
message originating locally from a user of a cellular telephone, input via the phone's microphone, or
from another caller through the phone's receiving channel. The message may subsequently be played back to
a speaker of the telephone, to be heard by the user of the telephone, or to the distant telephone. This
enables the telephone to provide features of prompt, voice pad, transcription, and voice mail.

A sound effects device is available from www.shopvovager.com (Phone-Fun Special Effects Machine,
#TE2200) that plugs into a wired phone or answering machine and generates 10 sound effects (doorbell,
street noise, etc.) into the phone. The sound effects are fixed at time of manufacture, except that a
voice altering circuit is included to disguise one's voice.

Text to Speech Synthesis

Text to Speech (TtS) conversion vocalizes typed text with a synthetically generated voice. TtS toolkits
are available from many sources, including but not limited to the references given below. Applications of
TtS include: vocalized email and other documents, files, or database entries; conversational character
speech generation (i.e., interactive conversation with a synthetic speaker or other software program);
multilingual translation (i.e., type in one language, voice in another); and voicing typed-in phrases for
people with disabilities.

TalkToMe! is a communications tool for people who can no longer speak (e.g., as a result of
neurodegenerative diseases). It uses Apple's Text-To-Speech capability to simulate speech, and allows you
to store common words and phrases for two-click speaking (one click to select the phrase, one click to
speak it). You can type the text into the text field, paste it in, or use an on-screen keypad to "click"
it in. Once entered, you can "speak" the text by hitting return or clicking the Speak button. If you
would like to save the phrase, you may store it in any of your user-defined categories. TalkToMe! comes
with 6 categories, which may be modified to suit individual tastes. They are: Bible Verses, Common Words,
Jokes, Poems, Prayers, and Sayings. Additional categories may be added up to the limit of 100.

Devices have been created for add-on speech synthesis capability. For example, MultiVoice is a
lightweight and portable speech synthesizer with text-to-speech capabilities that connects to a computer
RS-232 serial interface.

A number of speech to text software toolkits for PCs and other computers are available (see References).

Text to Speech conversion is one possible background technology for mixed-mode conversation as
described in this Invention Proposal, and, in particular, for the Audio Generator capability. TtS by
itself requires significant typing by the user and hence may not be appropriate for many public usages
(e.g., typing noise and level of effort required for typing-only solutions). TtS provides only a
synthetic voice that is recognizably artificial and does not sound like the intended recipient of the
call. In the PC embodiment described previously, the TtS capability is a backup capability for handling
conversational situations that are the exception rather than the norm. Predefined versions of
conversational templates are not stored.

Other Multimodal Speech Synthesis

Most of these systems compensate for loss of various sensory abilities by converting between vocal
speech and other modes of communication. These systems are designed for a cross-modal transformation of
input to support an ongoing dialogue. However, they do not do so in a manner easy and appropriate for
users with fully able hearing.

System / Description

Minspeak
    Semantic Compaction (Minspeak) uses pictures (icons) to express conversation using a small set of
    symbols. It does this by assigning more than one meaning to each icon and then sequencing them
    together to produce those different meanings. Each icon has a primary (picture producer) meaning and
    several secondary meanings. A number of Minspeak keyboards have been developed for people with speech
    disabilities. Synthetic voices are used based on Text to Speech technology (above).

Visual/tactile speech generation
    Braille cell displays are connected to a PC port and the text string is converted to grade level 1
    braille. Six solenoids raise the braille pins.
    Teleface, a telephone communication aid for the hard of hearing, generates a synthetic face that
    articulates in synchrony with the telephone speech.
    Note these systems silently deliver speech, but there is no mechanism to silently reply.

Intermodal translation
    Concepts expressed in different modalities & translate between them (e.g., synthetic speech, Braille
    display, writing to sign language, sound / text information to spatial, non-verbal expression of
    questions, requests
[Table: feature comparison of Quiet Call with related systems. Systems compared: Quiet Call, Nokia speech
recall, Sound Effect, TtS (TalkToMe), Minspeak, Virtual Assistants, Multimodal Interaction. Features
compared: supports quiet conversation for callers in public; permits audible conversation for other
callers; the communication interface is easy to use when a user is engaged in other activities; the
communication interface is situation-appropriate; the system works with the existing communication
infrastructure; the conversation permitted is expressive.]

Status

The bridge configuration embodiment uses much of the same technology as the Telephony Processor-based
Quiet Calls prototype described in patent filing FXPAL-IP-00-005E. There is still a need to conduct
user-centered design for creating specific interfaces for the scenarios described here.
Relevance to   US Patent   Title
Quiet Call     number

High           4241521     Multi-symbol message communicator for a speechless, handicapped person
High           5790957     Speech recall in cellular telephone
Med            4661916     System for method for producing synthetic plural word messages
Med            5210689     System and method for automatically selecting among a plurality of input
                           modes
Med            5297041     Predictive scanning input system for rapid selection of auditory and visual
                           indicators
Med            5920303     Dynamic keyboard and method for dynamically redefining keys on a keyboard
Low            4515995     Telephone answering machine with apparatus for selecting particular
Low            4591664     Multichannel interactive telephone answering apparatus
Low            4715060     Door message apparatus with telephone answering device
Low            4985913     Multiple message answering machine keyed to the incoming phone number
Low            5029214     Electronic speech control apparatus and methods
Low            5259024     Telephone answering service with integrated voice and textual message
                           storage
Low            5668868     Memorandum recorder for use with a telephone
Low            5822403     Automated telephone hold device
Low            5991374     Programmable messaging system for controlling playback of messages on
                           remote music on-hold-compatible telephone systems and other message
                           output devices



Has invention been built, made, run, or tested?

The bridge configuration embodiment uses much of the same technology as the Telephony Processor-based
Quiet Calls prototype described in patent filing FXPAL-IP-00-005E.

Is the invention used in a current product(s) or planned for use in a future product(s)?

We are currently pursuing productization of Quiet Calls (e.g., licensing in different telecommunication
market segments). The product concept for Multi-channel Quiet Calls is currently being investigated. This
invention represents additional intellectual property in that patent portfolio.

Dates of any previous or planned future disclosures external to Xerox

A submission for CHI 2002 is currently planned, with a paper submission date in September 2001.

Source of outside funding

None.